Impact of Corpus Diversity and Complexity on NER Performance
Authors
Abstract
We describe a cross-corpora evaluation of disease mention recognition for two annotated biomedical corpora: the Human Variome Project Corpus and the Arizona Disease Corpus. Our analysis of the performance of a state-of-the-art NER tool in terms of the characteristics and annotation schema of these corpora shows that these factors significantly affect performance.
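A cross-corpus comparison of this kind could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which metric was used, so exact-match span-level precision, recall and F1 are assumed here, and the corpus names, offsets and the helper `span_f1` are hypothetical toy values, not data or code from the study.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one disease mention


def span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 over per-sentence span sets."""
    # A predicted span counts as correct only if both offsets match a gold span.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up annotations standing in for two corpora (assumption, not real data).
corpora = {
    "corpus_a": {
        "gold": [{(0, 6)}, {(4, 12), (20, 28)}],
        "pred": [{(0, 6)}, {(4, 12)}],
    },
    "corpus_b": {
        "gold": [{(3, 9)}, {(0, 5)}],
        "pred": [{(3, 10)}, {(0, 5)}],  # (3, 10) is a boundary mismatch
    },
}

# Score the same (hypothetical) tagger output separately on each corpus.
scores = {name: span_f1(c["gold"], c["pred"]) for name, c in corpora.items()}
for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Under exact matching, a boundary disagreement such as (3, 9) versus (3, 10) counts as both a miss and a false alarm, which is one way differences in annotation schema can lower measured performance.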
Similar resources
A'lam Corpus: A Standard Named Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type, and classifying it into predefined categories. The categories of named...
Generating Chinese Named Entity Data from a Parallel Corpus
Annotating Named Entity Recognition (NER) training corpora is a costly process but necessary for supervised NER systems. This paper presents an approach to generate large-scale Chinese NER training data from an English-Chinese discourse-level aligned parallel corpus. The difficulty of NER differs among languages due to their unique features. For example, the performance of English NER systems i...
A System for Identifying and Classifying Names in Persian Texts
A named entity recognition (NER) system identifies one or more kinds of names in a text and classifies them into specified categories. These categories can be names of people, organizations, companies, places (country, city, street, etc.), times related to names (date and time), financial values, percentages, etc. Although during the past decade a great deal of research has been done on NER in ...
PAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
The Impact of Top Management Team Diversity on Company Performance
This study aims to examine the impact of diversity among the members of the top management team (TMT) on the financial performance of manufacturing firms. Diversity components of the top management team were divided into two categories: managerial background variables (age, gender, background, education, work skills) and managerial areas (industrial and international). In addition, financial perform...